15 research outputs found

    Language Modeling with Power Low Rank Ensembles

    We present power low rank ensembles (PLRE), a flexible framework for n-gram language modeling where ensembles of low rank matrices and tensors are used to obtain smoothed probability estimates of words in context. Our method can be understood as a generalization of n-gram modeling to non-integer n, and includes standard techniques such as absolute discounting and Kneser-Ney smoothing as special cases. PLRE training is efficient, and our approach outperforms state-of-the-art modified Kneser-Ney baselines in terms of perplexity on large corpora as well as on BLEU score in a downstream machine translation task.
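    Since the abstract names absolute discounting and Kneser-Ney smoothing as special cases of PLRE, the sketch below illustrates only that interpolated bigram Kneser-Ney special case, not the PLRE low-rank ensemble itself; the toy corpus, the 0.75 discount, and the helper name kneser_ney_bigram are illustrative assumptions.

        # Interpolated bigram Kneser-Ney smoothing: absolute discounting of the
        # bigram estimate, interpolated with a continuation-count unigram distribution.
        from collections import Counter, defaultdict

        def kneser_ney_bigram(tokens, discount=0.75):
            """Return p(word | prev) under interpolated bigram Kneser-Ney smoothing."""
            bigrams = list(zip(tokens, tokens[1:]))
            bigram_counts = Counter(bigrams)
            context_counts = Counter(tokens[:-1])                   # occurrences of each token as a left context
            continuation = Counter(w for (_, w) in set(bigrams))    # distinct left contexts per word
            followers = defaultdict(int)                            # distinct right types per context
            for (c, _) in set(bigrams):
                followers[c] += 1
            total_bigram_types = len(set(bigrams))

            def prob(word, prev):
                # Discounted bigram term.
                higher = max(bigram_counts[(prev, word)] - discount, 0.0) / max(context_counts[prev], 1)
                # Interpolation weight: the probability mass removed by discounting this context.
                lam = discount * followers[prev] / max(context_counts[prev], 1)
                # Continuation (Kneser-Ney unigram) term.
                lower = continuation[word] / total_bigram_types
                return higher + lam * lower

            return prob

        p = kneser_ney_bigram("the cat sat on the mat the cat ran".split())
        print(p("cat", "the"))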

    Latent-Variable Synchronous CFGs for Hierarchical Translation

    Data-driven refinement of non-terminal categories has been demonstrated to be a reliable technique for improving monolingual parsing with PCFGs. In this paper, we extend these techniques to learn latent refinements of single-category synchronous grammars, so as to improve translation performance. We compare two estimators for this latent-variable model: one based on EM and the other a spectral algorithm based on the method of moments. We evaluate their performance on a Chinese–English translation task. The results indicate that we can achieve significant gains over the baseline with both approaches, but the moments-based estimator in particular is faster and performs better than EM.
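    The abstract compares EM against a spectral estimator based on the method of moments; the sketch below shows the rank-k SVD projection step that spectral estimators for latent-variable grammars commonly build on. The co-occurrence matrix, its dimensions, and the name spectral_latent_projection are assumptions for illustration and do not reproduce the paper's actual moment construction over synchronous rules.

        # Rank-k SVD of an inside/outside co-occurrence (second-moment) matrix:
        # the core projection step used by spectral, method-of-moments estimators.
        import numpy as np

        def spectral_latent_projection(cooc, k):
            """Project inside and outside features onto a k-dimensional latent space.

            cooc : (n_inside, n_outside) empirical co-occurrence matrix.
            k    : assumed number of latent refinements per non-terminal category.
            """
            U, s, Vt = np.linalg.svd(cooc, full_matrices=False)
            return U[:, :k], Vt[:k, :].T

        # Toy stand-in for real grammar statistics.
        rng = np.random.default_rng(0)
        cooc = rng.random((50, 40))
        U, V = spectral_latent_projection(cooc, k=4)
        print(U.shape, V.shape)   # (50, 4) (40, 4)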

    Low-Dimensional Context-Dependent Translation Models

    Context matters when modeling language translation, but state-of-the-art approaches predominantly model these dependencies via larger translation units. This decision results in problems related to computational efficiency (runtime and memory) and statistical efficiency (millions of sentences, but billions of translation rules), and as a result such methods stop short of conditioning on extreme amounts of local context or global context. This thesis takes a step back from the current zeitgeist and posits another view: while context influences translation, its influence is inherently low-dimensional, and problems of computational and statistical tractability can be solved by using dimensionality reduction and representation learning techniques. The low-dimensional representations we recover intuitively capture this observation, that the phenomena that drive translation are controlled by context residing in a more compact space than the lexical-based (word or n-gram) “one-hot” or count-based spaces. We consider low-dimensional representations of context, recovered via a multiview canonical correlations analysis, as well as low-dimensional representations of translation units that are expressed (featurized) in terms of context, recovered by a rank-reduced SVD of a feature space defined over inside and outside trees in a synchronous grammar. Lastly, we test our low-dimensional hypothesis in the limit, by considering a semi-supervised learning scenario where contextual information is gleaned from large amounts of unlabeled data. All empirical setups show improvements by taking into account the low-dimensional hypothesis, indicating that this route is an effective way to boost performance while maintaining model parsimony.
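    The thesis recovers context representations via a multiview canonical correlations analysis; the sketch below shows a plain two-view CCA in that spirit. The left/right context feature views, their dimensionalities, and the 10-component target space are illustrative assumptions rather than the thesis's actual setup.

        # Two-view CCA: project correlated context views into a shared low-dimensional space.
        import numpy as np
        from sklearn.cross_decomposition import CCA

        rng = np.random.default_rng(0)
        n_tokens = 500
        left_context = rng.random((n_tokens, 60))    # toy stand-in for left-context features
        right_context = rng.random((n_tokens, 60))   # toy stand-in for right-context features

        # Fit CCA and map both views to 10 shared dimensions.
        cca = CCA(n_components=10)
        left_low, right_low = cca.fit_transform(left_context, right_context)
        print(left_low.shape, right_low.shape)       # (500, 10) (500, 10)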